Multiscale Trend Analysis 



Ilya Zaliapin * 
Andrei Gabrielov J and 
Vladimir Keilis-Borok* 

May 04, 2003 



*Institute of Geophysics and Planetary Physics, University of California, Los Angeles, CA 90095- 
1567, USA and International Institute of Earthquake Prediction Theory and Mathematical Geophysics, 
Russian Academy of Sciences, Moscow, Russia, E-mail: zal@ess.ucla.edu Phone: +10-310-8256115, Fax: 
+10-310-2063051, corresponding author. 

^Departments of Mathematics and Earth and Atmospheric Sciences, Purdue University, W.Lafayette, 
IN 47907-2067, USA. E-mail: agabriel@math.purdue.edu 

^Institute of Geophysics and Planetary Physics and Department of Earth and Space Sciences, Uni- 
versity of California, Los Angeles, CA 90095-1567, USA, and International Institute of Earthquake Pre- 
diction Theory and Mathematical Geophysics, Russian Academy of Sciences, Moscow, Russia, E-mail: 
vkb@ess.ucla.edu 



1 



Multiscale Trend Analysis 



1 



Abstract 

This paper introduces a multiscale analysis based on optimal piecewise linear 
approximations of time series. An optimality criterion is formulated and on its base 
a computationally effective algorithm is constructed for decomposition of a time 
series into a hierarchy of trends (local linear approximations) at different scales. 
The top of the hierarchy is the global linear approximation over the whole obser- 
vational interval, the bottom is the original time series. Each internal level of the 
hierarchy corresponds to a piecewise linear approximation of analyzed series. Possi- 
ble applications of the introduced Multiscale Trend Analysis (MTA) go far beyond 
the linear interpolation problem: This paper develops and illustrates methods of 
self-affine, hierarchical, and correlation analyses of time series. 

Key words: multiscale trend analysis, piecewise linear approximation, hierarchical 
scaling. 

1 Introduction 

The motivation for the Multiscale Trend Analysis (MTA) introduced in this paper 
is to describe and analyze time series in terms of their observed trends (local linear 
approximations). Indeed, trends are the most intuitive feature of a time series and it 
seems natural to use them for series quantitative description. Such a description is 
intrinsically multiscale since each non-trivial process exhibits juxtaposition of trends of 
different duration and steepness depending on the observational scale. 

The proposed analysis is based on piecewise linear approximations of the analyzed 
time series. Construction of such approximations involves a tradeoff between quality and 
detail. We formulate (see Sect. 12 .Hj) a local optimality criterion and use it in a multiscale 
fashion to detect local trends in a time series at all possible scales, thus forming a hierarchy 
of trends. This hierarchy serves as a unique representation of the original time series and 
is used for quantitative analysis. 

The problem of piecewise interpolation of time series has been given significant atten- 
tion in the context of image processing (see for example jUEllH])- However, the focus was 
on constructing an optimal piecewise linear approximation L e (t) with minimal number 
of segments for given error e (deviation from the original signal). On the contrary, we 
concentrate on finding a whole hierarchy of consecutively more detailed approximations. 

This paper illustrates the following applications of MTA: 

• Descriptive and exploratory data analysis. Computationally effective trend decom- 
position naturally complements a standard data miner's toolbox. Conveniently, 
MTA does not rely on any assumptions about the analyzed time series (e.g. sta- 
tionarity or existence of higher moments) while its results are easily interpreted 

• Self-affine analysis. Particularly, MTA provides a way to extract local fractal prop- 
erties of the processes. 
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• Hierarchical analysis. Representation of a time series as a hierarchy (tree) allows 
one to use methods borrowed from the theory of hierarchical scaling complexities jlj. 
Particularly, Horton-Strahler indexing provides a natural way to consider scaling 
laws for trends. 

• Correlation analysis. MTA allows one to detect non-linear correlations, particularly 
those caused by the presence of amplitude modulation and non-linear long-term 
trends. 

The paper is organized as follows: Section El introduces the basic notions and de- 
scribes the computational algorithm for decomposition of a series into a hierarchy of 
trends. Methods of MTA-based self-affine analysis comprise Sect. El Section 0] intro- 
duces hierarchical analysis of time series. Correlation analysis is described in Sect. El 
Fractional Brownian walks and Mandelbrot cascade measures are used to illustrate meth- 
ods of Sect. El -El Section El concludes. 

2 Multiscale Trend Decomposition 

The core of the MTA is construction of a hierarchical tree Tx that describes the 
trend structure of a given time series X(t) . Trend is defined here as a linear least square 
approximation of X(t) at a subinterval of the observational time interval. The tree Tx 
is formed step-by-step, from the largest to the smallest scales: First, we determine the 
longer trends, then look for the shorter and shorter trends against the background of 
already established ones, all the way down the hierarchy of scales. The larger the scale at 
which the trend is observed, the higher the level of the corresponding vertex within the 
tree. The root (top vertex) of the resulting tree Tx corresponds to the global linear trend 
of X(t); each internal vertex corresponds to a distinct local trend, the leaves (vertices 
with no descendants) to the the elementary linear segments of the original time series 
X(t): [X(ti), X(t i+ i)]. The union of leaves thus coincides with X(t). 
A recursive procedure for constructing the tree Tx is described below. 

2.1 Scheme of the decomposition 

Without loss of generality we presume that the time series X(t) is observed at a finite 
number of epochs within the time interval [0, 1]. At the first step the whole time series 
X(t), t G [0, 1] is approximated by a single trend — the linear least square fit L (t) 
(Fig. la). 

This trend forms the vertex v° at the level (the root) of the resulting hierarchical 
tree Tx (Fig. lc). It is also convenient to say that the root of Tx corresponds to the 
whole time interval [0, 1], and vice versa. At the next step we determine secondary 
trends on the background of the first global one. For this we consider the deviation 
X\(t) = X(t) — Lo(t), t G [0, 1] of X(t) from its linear trend Lo(t) and approximate it 
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by a piecewise linear (discontinuous) function L\(t) (Fig. lb). The most delicate part of 
the analysis — choosing the optimal number n° of segments for this approximation - 
is described below in Sect. 12.21 The approximation L\{t) results in partition of the time 
interval [0, 1] = 1° into n° nonoverlapping subintervals 1} = [tj, t] +1 ], i = l,...,n°, 
with t\ = 0, £^o + i = 1- The linear segments l}{t) that comprise Li(t) are determined 
by the least square fit of X(t) within corresponding subintervals. They form n° vertices 
v I , i = 1 , . . . , n° at level 1 of the tree Tx ■ The enclosures I\ C 1° are reflected in the 
structure of the tree Tx by the fact that the vertices corresponding to subintervals 1} are 
descendants of the root, which corresponds to 1°. 

Repeating the above procedure at arbitrary interval I\ from level 1 we form n\ ternary 
linear trends, each determined by the least square fit of X(t) at a subinterval I 2 C 
j = 1, . . . , n\. The union of N 2 = Yh=x n ] snch trends descending from all the trends of 
level 1 form level 2 of the tree Tx- To index the vertices (local trends) at level 2 we use 
the natural ordering induced by the corresponding time partition: v 2 {If) denotes the 



t 2 t 2 



vertex (trend) that corresponds to the time subinterval If 

Repeating the same procedure at each time interval of level I, I > we form level 
(I + 1). It consists of 



Ni = E «i 

i=l 

subintervals (vertices). By construction, N = 1 and N k < N p for k < p. We depth of 
the resulting tree is denoted by L. 

Each level I of the tree Tx corresponds to a piecewise linear approximation Li(t) 
of the time series X(t) as well as to the induced partition I 1 = {/•, i — 1, . . . ,NA of 
the observational interval 1°. The global piecewise linear approximation Li(t) at level 



Ni, and 



I is a union of local linear approximations l\(t), t e l\ 

i° = utiA v/. 

By r\ we denote the length of subinterval i], and by e- the rms deviation of X(t) from 
its linear fit l\(t) at this subinterval: 



4 e (*(*) - m 



The total fitting error E\ at the level I is given by 

Ni _ „ 

^ = EW =E(^(*)-M*)) • (2) 

i=l te /o 

All vertices (subintervals) at a given level of Tx result from the same number of 
divisions of the initial interval [0, 1]. However, in many applications it is desirable to work 
with approximations characterized by a similar scale of observed trends, independently of 
the division history. To take this into account we consider the modified tree Mx obtained 
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from Tx by the following procedure. The first two levels of Mx are the same as that of 
Tx- Each consecutive level is formed by division of only one of the existing subtrends 
and leaving all the other unchanged. A subtrend v\ to be divided corresponds to the 

maximal improvement of the fitting quality A = — J2 ( e c +1 ) 5 where c runs over the 
indexes of children of the vertex i. We will call Tx the topological and Mx the metric 
tree associated with the series X(t). To avoid excessive notations we will use the same 
indexing for both the trees Tx and Mx stating each time which one is considered. 



2.2 Optimal piecewise linear approximation 

Here we describe a procedure for finding the optimal piecewise linear approximation 
L(t) of a series X(t) at a given time interval. Without loss of generality we suppose that 
the interval is [0, 1]. The problem, of course, is in finding the optimal tradeoff between 
the number N of linear segments within L(t) and the corresponding fitting quality E. 
Clearly, the larger the number N, the better the resulting fit. Our goal is to depict by 
linear segments only the most prominent large-scale trends of X(t) leaving the smaller 
fluctuations for the later steps of the decomposition. To solve this problem we employ 
the function 

W ) = -M^f), (3) 

where Eq is the fitting error of the global linear approximation Lo(t) of X(t) on [0, 1]. 
This function measures the quality of a piecewise linear approximation L(t; N, E) which 
consists of N linear segments and has total fitting error E. The optimal approximation 
L(t; N*, E*) corresponds to the maximum of H(N, E): 

H(N*,E*)=m^H(N,E). (4) 

Geometrically, consider the plane (N,\og(E/ E )), N being the number of linear seg- 
ments within a piecewise linear approximation of X(t) on [0, 1], and E the total fitting 
error. The global linear approximation Lo(t) at the whole interval [0, 1] corresponds to 
the point po = (1,0). An arbitrary piecewise approximation Li(t) corresponds to the 
point Pi = (N i ,\og(E i /E )), iVj > 1, E^ < E . The slope of the linear segment [po>Pi] 
shows the increase of the fitting quality per one additional segment of approximation. 
By the criterion (|3I4|) we chose the approximation with the maximal quality increase. 

With the above criterion (13141) one can find the optimal approximation by a full search 
over all possible partitions of [0, 1] by epochs of X(t) into N = 2,3,... subintervals. 
However, the computational complexity of such an approach depends exponentially on 
the number of observations so it can hardly be used in practice. In Sect. 12.31 below we 
introduce an optimized search based on the idea that partition epochs should correspond 
to the prominent edges of the analyzed series X(t). 
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2.3 Optimized search 

The idea of the optimized search is to reasonably reduce the set of possible partition 
epochs by considering only those at which X(t) significantly changes its slope — edge 
points. 

The edge points are determined by the following recursive procedure illustrated in 
Fig. 2. 

At the first step we choose the epochs (ti,t 2 ) corresponding to the maximum and 
minimum of the detrended function X\(t) = X(t) — Lo(t), where Lo(t) is a least-square 
linear fit of X(t) in [0, 1]. If one of these epochs coincides with the interval boundary 
(say, ti = 0) only the remaining epoch (£2) is considered. If both these epochs coincide 
with the interval boundaries, we redefine Lo(t) as the line connecting X(0) and X(l) 
and repeat the procedure. As a result we have one or two partition epochs within the 
initial interval; they divide it into two or three subintervals respectively. The procedure 
is now repeated for each of these subintervals, producing two to six new partition epochs. 
Together with already selected ones, they divide the initial interval into, respectively, 
four to nine subintervals, etc. The partition stops when the predefined number (Nh — 1) 
of partition epochs is collected; this corresponds to Nh subintervals. 

With (Nh — 1) possible partition epochs there are (2 Nh ~ l — lj ways to divide the 
interval into 2, . . . , Nh subintervals. The optimal — according to (13141) — partition can 
be found by (2 Nh ~ 1 — 2j operations. 

To further reduce the computation volume, we first choose the optimal one from 
(Nh — 1) partitions formed by (Nh — 2) partition epochs. Next, only the (Nh — 2) epochs 
that form this partition are used to find the optimal partition with (Nh — 3) partition 
epochs, etc. Finally, we use criterion (|3I4|) to choose the optimal from (Nh — l) partitions, 
each having a distinct number of subintervals ranging from 2 to Nh. This way we reduce 
the number of operations to (N% — Nh — 2)/2. 

Clearly, the above optimization may produce a piecewise function which does not co- 
incide with the optimal one resulting from applying the criterion (l-'jl4l) to the whole variety 
of possible partitions. As such, this optimization should be considered as a computation- 
ally effective approximation of the result. Extensive numerical experiments show that 
it is reasonably good for a wide range of time series including fractional Brownian mo- 
tions with different Hausdorff measures and self-affine processes coming from geophysical 
observations. 

2.4 Examples 

Here we show some examples and illustrate different ways to visualize the results of 
the decomposition. 

Figure 3 shows four levels, I = 0, 1,2, and 10 of tree Mx for a fractional Brownian 
walk with Hausdorff measure Ha = 0.7. Panel a) shows the analysed series X(t) and 
the piecewise linear approximations Li(t), I = 0, 1,2, 10, while panel b) shows the four 
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corresponding levels of the tree Mx- 

One can see how the fitting quality improves with the number of linear segments: 
each consecutive approximation tries to account for the most prominent variations of 
X(t) adding the least possible number of new segments. For example, starting with the 
three segments of the decomposition L\(t) at level 1, it is clearly more efficient to improve 
the leftmost segment, which exhibits large deviations around t = 0.1, than work with the 
central or rightmost one. When work is done with the largest deviation (see level 2) we 
proceed to the smaller ones. 

The function shown in Fig. 4a on the background of its tree Mx is a sum of three 
sinusoids with different frequencies. 

The amplitudes are chosen in such a way that the largest fluctuations are carried at 
the smallest frequency, intermediate at the second largest, and smallest at the highest 
one. This structure is clearly depicted by the decomposition with each separate level 
responsible for a distinct frequency (see panels b), c), and d)). 

Two more examples are given in Fig. 5 where we show only the signals X(t) and the 
upper levels of their trees Mx, which is enough to understand the shape of corresponding 
piecewise linear approximations. Decomposition for the famous Devil's Staircase is shown 
in Fig. 5a: it gives the exact description of the staircase structure. Figure 5b shows a 
decomposition for modulated oscillations with time-dependent frequency. Contrary to 
the panel a) here we use color-code to depict slope changes (from downward to upward 
or vice versa), not their directions. In this example one can see how the amplitude of 
oscillation is reflected in the decomposition: the higher the amplitude, the higher the 
level at which it is first detected. 

2.5 On the numerical parameter Nh 

The only numerical parameter of our algorithm is the maximal number Nh of 
secondary trends (see Sect. 12. 3|) . Large values of Nh contribute to the computational 
complexity, while small values may prevent fast detection of optimal approximation and 
create superfluous levels of the hierarchy Tx- Numerous experiments suggest the value 
Nh = 5 as the optimal tradeoff, and we use it for all experiments presented in this paper. 

Clearly, with Nh = 5 we are not insured from creating unnecessary levels. For example 
the division of Fig. 4b consists of 6 (> Nh = 5) linear segments, so it could not be 
obtained by a single division of the original series. In fact this is level 2 of the original 
hierarchy Mx- Analogously, the intermediate division of Fig. 4a (see also Fig. 4c) 
corresponds to level 19, and the bottom one (Fig. 4d) to level 82. 

The simple procedure used to remove unnecessary levels is illustrated in Fig. 6 where 
we show the fitting error Ei/Eq for all levels I of the tree Mx constructed for the signal 
of Fig. 4a. The prominent edge points show the three levels at which saturation of the 
fitting quality is reached; only these three levels are left in Fig. 4a. 

If the analyzed tree has only less-than-5-fold partitions (which is the case for the 
Devil's Staircase of Fig. 5a) the above procedure is unnecessary. The properties of this 
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procedure and conditions for its use are beyond the scope of the present paper. 



3 Self-affine analysis 



In this section we demonstrate how self-affine properties of a time series are reflected 
in its decomposition Mx- 

Recall [SHE] that statistical properties of a self-affine time series X(t) remain the same 
under the transformation 

it' = rt 
) X' = r Ha X. 



Hav (5) 



That is, when one changes the observational time scale by a factor of r, the scale of 
measurements should be changed by a factor of r Ha in order to preserve the characteristic 
statistical features of X(t). The parameter Ha is called Hausdorff measure; it is related 
to the fractal dimension D of a self-affine time series as Ha = 2 — D |E] . Accordingly, for 
one-dimensional time series the Hausdorff measure may take values within the range < 
Ha < 1. A useful interpretation of Ha comes from the character of correlations between 
the time series increments: Aj = X(ti) — Negative correlations between Aj and 

Aj + i lead to high fluctuations of X(t) and as a result to absence of pronounced trends; 
this situation corresponds to small values of Hausdorff measure: Ha < 1/2. Positive 
correlations — leading to existence of long-term trends — correspond to Ha > 1/2. For 
a process with independent increments (e.g. Brownian walk) one has Ha = 1/2. 

To estimate the Hausdorff measure of observed time series one typically considers the 
dependence of a convenient measure of its variation on the length of a corresponding 
observational interval El • in our case the appropriate variation measure can be chosen 
as the fitting error E\ (J2J) of Mx at level I. According to (JHJ), for a self-affine series X(t) 
we expect to observe a power-law relation 

§- = N r Ha = R? a , (6) 

where iVj is the number of segments at the level I, Ri = Nf 1 is their mean length. 

As a model example we consider fractional Brownian walks (FBWs) with Hausdorff 
measures in the range < Ha < 1. 

Figure 7a shows trajectories and the corresponding (iVj, ^)-scalings for three FBWs 
with Ha = 0.1,0.5, and 0.9. Figure 7b shows the value b(Ha) estimated by the best 
linear square fit from the relation 

log(E l /E ) = -blog(N l ) (7) 

based on decomposition of 2100 independent FBWs; to remove statistical fluctuations 
we averaged b over 100 FBWs for each value of Ha. As seen in Fig. 7b, the scaling (JHjl 
clearly holds for Ha > 0.3; the deviations observed at the smaller values of Ha are due 
to the fact that the corresponding FBWs become noisier and hardly display pronounced 
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trends. This effect is typical for self-affine analysis (e.g., see J7J). To neglect it we consider 
the integrated signal Y(t) = Y, s <tX(s). Estimations of the slope b(Ha) for integrated 
FBWs are presented in Fig. 7c. The linear relation b(Ha) = Ha + 1 is now observed 
for < Ha < 0.6, the change of slope compared to Fig. 7b is due to the integration 
procedure. 

Another way to estimate Ha is to consider the error-length dependence for all indi- 
vidual linear segments comprising Mx'- 

e- / ,\Ha+l/2 , 

-f = (rO , Z = 1,...,L, i = l,. (8) 

The difference in the power exponents of relations (JHJ) and (JSJ) is explained by the fact 
that the former deals with averaged statistics, while the latter deals with characteristics 
of individual intervals. Figure 8 illustrates the error-length dependence (JSJ) for FBWs 
with Ha = 0.1 and Ha = 0.9. 

Importantly, MTA provides a convenient basis for estimation of local Hausdorff mea- 
sures Ha(t). Consider all the intervals from Tx that cover epoch t. At each level / of 
Tx there is one and only one such interval; we use the index L to denote this interval 
and all its characteristics. The local Hausdorff measure Ha(t) is estimated now from the 
relation 

e\ t \ t , \Ha(t)+l/2 , 

f = (rfo) ,( = 1 L. (9) 

Figures 9a,b show the dynamics of the local Hausdorff measure for multi- and monofrac- 
tals. We use a Mandelbrot cascade measure M(0.7, 0.3; 0.3, 0.7) as a model example of 
a multifractal (Fig. 9c), and a Brownian walk as that of a monofractal (Fig. 9d). The 
definition of Mandelbrot cascade measure is given in Appendix El Note that the range 
of Half) variation for the monofractal (Fig. 9b) is an order of magnitude less than that 
for the multifractal (Fig. 9a). 

The points (e^, r^J used in © to estimate the local Hausdorff measure are ex- 
tracted from the whole set (e\, r l A of (JSJ). This suggests a method for detecting multi- 

fractality in X(t): the larger the scattering of the points (e\, r'J, the larger the prob- 
ability that the observed series is a multifractal. Formal statistical tests can be easily 
constructed from this general principle based on the particular problem at hand. The 
character of temporal variations of Ha(t) (Figs. 9a,b) can be also used in such tests. An 
example of the scattering (e\, rj) is shown in Fig. 10 for mono- and multifractals of Fig. 
9. In this model example the difference is obvious. 

4 Hierarchical scaling 

The appropriate ordering of vertices within a tree Tx is very important for meaningful 
description and analysis of the series X(t). The problem of such an ordering becomes 
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not trivial as soon as the tree is not uniform (i.e. is not formed by applying the same 
deterministic division rule to each of its vertices). A befitting way to solve this problem 
is given by the Horton-Strahler topological classification of ramified patterns jU El E] 
illustrated in Fig. 11: One assigns orders to the vertices of the tree, starting from k — 1 
at leaves (vertices with no descendants). 

The order of an internal vertex equals the maximal order m of its descendants, if they 
are distinct, and m + 1 if they are all equal. Originally introduced in geomorphology by 
Horton jS] and later refined by Strahler this classification is shown to be inherent in 
various geophysical, biological, and computational applications |H [TUJ HH E2] ■ 

As a result of the Horton-Strahler indexing of the tree Tx, each of its vertices is 
characterized by an order k, length r of the corresponding partition interval, and the 
error e of the linear least square fit of X(t) on this interval. The scaling behavior of X(t) 
can be described by the exponents of the relations: 

N(k) ~ 10- BNk ; R{k) ~ 10 BRk ; E{k) ~ 10 BEfc . (10) 

Here N(k) is the number of vertices of order k, R(k) and E(k) are the values of r and e 
averaged over the vertices of order k. 

The relation between the number N(k) of vertices of order k and their average length 
R(k) determines the fractal dimension d of the tree Tx |10j : 

N(k)=R(k)- d . (11) 

Combining (fTUj) and (jllj) we find: 

(12) 

The structure of the tree T x can be considered at different levels of detail: First, one 
can consider only the topological structure (Fig. 12a), where the position of each vertex 
is uniquely determined by its parent (the nearest vertex placed closer to the root); and 
any permutation of siblings (the vertices with the same parent) does not change the tree. 
Each vertex is characterized by its Horton-Strahler index, and the only constraint on a 
tree resulting from MTA is the maximal possible number of siblings, that is subtrends 
within a given trend. Next, one can add the information on interval partition (Fig. 12b): 
The siblings become ordered according to the partition of the interval corresponding 
to their parent. Each vertex Vi is additionally characterized by the length r*j and the 
following conservation law holds: 

r i = J2 r ^ ( 13 ) 

where c runs over the indexes of the children of the element i. 

Finally, (Fig. 12c) one considers error characteristics e^, which describe the quality 
of the linear fit of X(t) within the corresponding time interval. In terms of these errors 
the system becomes dissipative: 

e; > e « ( 14 ) 



Multiscale Trend Analysis 



10 



with the same meaning of subindexes as in ()13j) . 

The exponents B^^e of (fTUj) reflect different statistical properties of the tree Tx'- 
Bn describes its topological structure while Br and Be relate to the metric structures 
based, respectively, on properties of interval partition (r-metric) and piecewise linear fit 
(e-metric) . 

For illustration we again use FBWs with different Hausdorff measures. 

Figure 13 shows the dependence of the exponents B^,l,e on the Hausdorff measure 
< Ha < 1. The estimations are averaged over 100 FBWs for each value of Ha. The 
exponents B^ and Br are nearly constant: ~ 0.52, Br w 0.57, while for the exponent 
B E we observe the linear dependence: 

B E = 0.7 + Ha « log 10 (5) + Ha. (15) 

These results have an important interpretation: All FBWs with Hausdorff measure 
in the range < Ha < 1 have the same topological and r-metric structures in terms of 
MTA tree Tx- Particularly, trees Tx corresponding to different Ha have the same fractal 
dimension d = B^/Br pa 0.9. The only characteristic that depends on the Hausdorff 
measure is the fitting error (e-metric), that is the degree of variation of X(t) within a 
given interval. 

5 Correlation analysis 

One of the important applications of MTA is correlation analysis of time series. 
The major drawback of classical correlation analysis is that interpretation of its results 
may be completely ruined by the presence of long-term trends and/or modest amplitude 
modulations of signals. The MTA can naturally avoid these problems by depicting the 
essential local properties of the analyzed series. 

We start this section by introducing two measures of similarity for time series. One 
is based solely on the time interval partition induced by Mx', another takes into account 
the directions (upward vs. downward) of local trends. 

5.1 Distance between partitions 

Each level / of the tree Mx (Sect. |2J) corresponds to a partition of the time interval 
[0, 1] into Ni nonoverlapping subintervals. Since each of these subintervals corresponds 
to a distinct observed trend of the series X(t), the problem of comparison of two such 
partitions naturally arises. Below we introduce the distance between two partitions. 

Consider the space Q of finite partitions of the unit interval [0, 1]. Each partition 
A is defined by a finite number of points; the boundaries and 1 are included in all 
partitions: 

A = {0 = a < a x < . . . < a nA < a nA+1 = 1}. 
The trivial partition U consists only of boundary points: U = {0, 1}. 
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For A, B G Q we say that B is a subpartition of A {B C A) if all points from A are 
among points from B; this imposes a partial order on Q. A union A U B is defined as 
the partition consisting of the points included in either A or B, without repetitions. An 
intersection A (IB is defined as the partition consisting of points included in both A and 
B. 

An asymmetric distance m(A, B) from A to B (A, B 6 Q) can be defined as 

n A 

m{A,B) mm {loi-ftjl}, (16) 

which gives for the trivial partition 

m(A, [/) = m(v4) = ^ min{aj, 1 — aj} 

8=1 

The distance ()16|) is interpreted as the minimal correction to A that makes S its subpar- 
tition: B C A', where A' stands for the corrected version of A. 

The following properties of m(A, B) follow directly from the definition (|16j) : 

1. < m(A,B) < oo; 

2. m(A,B) = iff B C A; 

3. Additivity with respect to A: m(A 1 U A 2 , B) = m(Ai, B) + m(A 2 , B); 

4. Monotonicity with respect to B (the triangle inequality): m(A, BiUB 2 ) < m(A, B\) + 
m{A,B 2 ). 

It is convenient to consider the symmetric function 

/i(v4, B) = max{m(A, B),m{B, A)}, (17) 

whose small values signal that the partitions A and B are similar. Note that fi is not a 
distance since it does not satisfy the triangle inequality. The reciprocal /i -1 may serve 
as a measure of partition correlation. 

5.2 Slope sign correlation 

Here we introduce the correlation function that describe similarity between two 
piecewise linear approximations L 1 ^) and L 2 (t) of X(t), t 6 [0, 1]. (We use upper 
indexes in order not to mix these arbitrary approximations with Li(t), and L 2 (t) at the 
first and second levels of the decomposition.) This correlation function is based on the 
coarse information about trends from L l (t): We take into account only their directions 
— upward vs. downward. 

First, we introduce the signed partitions P\ and P 2 of the interval [0, 1]. They are 
formed by the intervals of constant sign of the slope of L l (t), i — 1,2 (see Fig. 14a). 
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A subinterval from Pi is assigned the sign "+" if the corresponding trend of L l (t) is 
upward, and "-" if it is downward. Second, we define the signed partition P as a union of 
Pi, i = 1, 2 with the signs determined by multiplication of the signs of the corresponding 
subintervals from (Fig. 14b). As a result, the positive intervals of P correspond to 
matching (up to direction) trends of L 1 and L 2 , while negative to unmatching ones. 

Each subinterval I of the partition P is formed by intersection of two subintervals 
Ii G Pi, i = 1,2; two general variants of such an intersection are shown in Fig. 14c. A 
subinterval / is assigned a triplet (a, b, c) defined as shown in Fig. 14c: b is the length 
of the intersection Ii ("1/2, while a and c are the lengths of those parts of Ii that are not 
included in the intersection. The triplet is normalized: a + b + c = 1. It describes how 
good is the matching of intervals lo the meaning of b is clear; the best matching for a 
given b corresponds to the case when the intervals' ends coincide, that is to a • c = 0. 
The matching quality can be reflected in the weight 

(1-6) log(l-fr) (a + c) log(o + c) 

w — 1 — r~\ 1 — rr = 1 — r~\ 1 — rTi (1°) 

a log (a) + c log(c) a log(aJ + c log(c) 

lying within the range < w < 1. 

The correlation function r(L x ,L 2 ) is now defined as 

r(L 1 ,L 2 )=Y,r k -w k . (19) 

k 

Here the summation is taken over all the subintervals of the signed partition P; denotes 
the signed length of the kth subinterval, Wk is the corresponding weight (JTBJ). 

The measure (|T9"j) is intentionally crude: it does not distinguish between steepness 
of the trends. More elaborate correlations can be easily defined following the scheme 
outlined above. Nevertheless, as we show in Sect. 15.31 below, even the roughest measure 
(|16|) is very effective in detecting non-linear correlations. 

5.3 Examples 

This section illustrates applications of the correlation analysis in the presence of 
long-term nonlinear trends and amplitude modulations. 

5.3.1 Detection of correlation 

Figure 15 displays the trajectories of two processes Fi(t), i = 1,2 coupled by the 
common underlying phenomenon which — by and large — makes them change their 
intermediate-scale trends synchronically. The most striking similarity between Fi(t) is 
observed at the intervals [0,0.1] and [0.2,0.55]. Also we note the synchronous peaks 
around t = 0.675,0.775,0.975 (more pronounced for Fi(t).) At the same time, the 
coupling phenomenon is not a primary one in shaping the dynamics of Fi(t), so their 
overall outlooks are still quite dissimilar. In such situations one is interested in detection 
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and proper quantification of the observed non-linear coupling. The problem of such a 
quantification constitutes an important part of modern analysis of time series. 

MTA suggests an effective way of solving this problem by comparing the trend struc- 
tures of observed series at different scales. We decompose the observations Fi(t) into trees 
Mi and calculate the distance /i (fTTjl between different levels of these decompositions. 
The reciprocal /x" 1 of the distance between the signals Fi(t) is plotted as the function of 
the decomposition levels k, i — 1,2 in Fig. 16a. 

The diagonal ridge indicates pairs of levels with similar trend structures. The promi- 
nent upwell observed at the medium scales — 15 < l\ < 18, 14 < l 2 < 17 — signals 
that this range is responsible for the observed coupling. The maximum /i -1 = 4.6 cor- 
responds to the levels l\ = 15, h = 14; we will refer to them as levels of maximal 
correlation (LMC). The piecewise linear approximations of Fi(t) corresponding to the 
LMC are shown in Fig. 17. They clearly accentuate the observed coupling. 

A typical shape of for uncoupled time series is shown for comparison in Fig. 
16b. The diagonal ridge is still observed, though it is more blurred. Existence of such 
a ridge is explained by the fact that partitions with a similar number of segments, even 
non-matching ones, are closer to each other in the sense of (JT7jl than partitions with 
significantly different number of segments. Comparing Figs. 16a and b we conclude that 
the upwell observed in panel a is not a random one and is due to the correlation between 
the signals. A formal statistical test for establishing the significance of the observed 
peaks of \i can be easily constructed. 

5.3.2 Quantification of detected correlation 

As was shown in the previous section, MTA allows one to estimate non-linear correla- 
tions between signals; the value /i -1 may be considered as a measure of such correlation. 
Here we show how to evaluate the functional form of the coupling phenomenon respon- 
sible for the correlation detected. 

To pose the problem formally, suppose that the observations Fi(t), i = 1, 2 are formed 
by applying amplitude modulations Ai(t) and adding non-linear trends Ti(t) to the same 
base signal X(t): 

F(t) = Ai{t) ■ X(t) + Ti(t) +ti(t),i = 1, 2. (20) 

Here are measurement errors. In this model the correlation between signals Fi(t) is 
due totally to the X(t). The first problem is to reconstruct trends Tj(i) and modulated 
signals Ai(t) ■ X(t) given the observations Fi(t). Clearly, for reliable reconstruction one 
has to assume an appropriate rate of variation for the trends as well as a reasonably 
small noise-to-signal ratio. In practice, we assume that such conditions are satisfied if 
significant coupling has been detected by the correlation analysis of Sect. 15.3.11 

The idea of reconstruction is that the correlated parts A4 ■ X(t) should be described 
by the LMC of Mi (see Sect. 15.3.1)1 . Accordingly, the trends Ti(t) should be described 
by the higher-scale (less detailed) levels. 
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As a model example we again use the series of Fig. 15; in fact, they are produced by 
the model flUJ) with 

X(t) = sin (4007rf(t- 0.5)0 -0.7)0- !)) I 
Ti(t) = 5 sin (4vrt 3/2 ) ; 

T 2 (t) = -5 cos (27rf 3/2 ) ; 
= exp(2t); 

A 2 (t) = 2exp(-t/3). (21) 

The measurement errors £j(t) are modeled by independent Brownian walks so they also 
represent random drifts. The series Fi(t) together with their components (J21)) are shown 
in Fig. 18. 

The trends Ti(t) + £«(£) are estimated by the piecewise linear functions Tj, formed 
by the parents of the vertices at the LMC, l\ = 15, 1% = 14. In other words, each of 
the linear segments at the levels li should be formed by a single non-trivial partition 
of one of the trends of Tj. By single we mean that this is a one-time partition by the 
rules described in Sect. El by non-trivial — that each segment is divided into more than 
one subsegment. The modulated signals Ai(t) ■ X(t) are estimated then as AjXj(t) = 
(Fi(t)-f f (t)), 2 = 1,2. 

The quality of these estimations is illustrated in Fig. 19 where we show real vs. 
estimated modulated signals AiX^t). The estimations are almost perfect at the intervals 
[0,0.1] and [0.2,0.55], (cf. Fig. 15 and its discussion in Sect. 15.3.11 ) Generally, we catch 
well the oscillatory structure of the signals; that is their time-dependent frequencies and 
directions (upward vs. downward), while the amplitude estimation is less precise. 

The estimations of Fig. 19 can be further improved by means of various kernel smooth- 
ing techniques. MTA results can be used for optimization of the time-dependent kernel 
width. 

With additional assumptions about the rate of variation for Ai(t) one may pose the 
problem of reconstructing X(t) given two, or more, modulated versions Ai(t)-X(t). Using 
the epochs assigned to the summands of (jTUjl (say, a*), one may analyze time-dependent 
correlations within Fi(t). Clearly, the entire analysis can be repeated with the correlation 
(|19|) as a measure of trend similarity. 

6 Discussion 

The methods developed in this paper are based on the computational technique 
(see Sect. |2J) for solving the linear interpolation problem for time series. This problem 
includes two principal difficulties. The first is a fundamental one: a tradeoff between 
the quality of a possible approximation and its detail. The second difficulty is purely 
computational: There are (n — 2)!/(n — 1 — k)\(k + 1)! ways to construct a piecewise 
linear approximation with a given number k of segments and n observational epochs. 
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Clearly, the search for the optimum over all possible approximations is unacceptable for 
operational use, and computationally effective algorithms are to be invented. Here we 
resolve the first difficulty by introducing the optimality criterion (|3J4J) of Sect. 12 .2\ and 
the second by replacing the original time series with its "skeleton" that includes only 
the edge points defined in Sect. 12. HI The whole analysis is then done hierarchically, 
in a multiscale self-similar fashion. This contributes to computational efficiency as well 
as to the imprecision of the final result, since the errors made in the first steps of the 
decomposition may affect all the consecutive steps. It would therefore be interesting to 
study a) deviations of the MTA approximations from the optimal (in a squared deviation 
sense) piecewise linear approximations with the same number of segments, and b) the 
history of the first-step errors. 

The procedure for edge point detection is introduced here (Sect. 12 .3|) in its simplest 
(not to say most naive) form and is subject to further improvement. Nevertheless, even 
in its present form, the MTA has the potential to be an effective tool for solving a 
wide specrtrum of applied problems, ranging from exploratory data analysis to studying 
hierarchical scaling for time series. 

Recently, several techniques based on properties of local linear trends were proposed 
and studied. The Detrended Fluctuation Analysis (DFA) [7j was shown to be a powerful 
tool for multiscale analysis and interpretation of diverse medical and financial data. 
Contrary to our analysis, DFA uses a predefined interval partition scheme independent 
of the particular series at hand. It is oriented toward analysis of variations, rather than 
the trend structure itself. An alternative approach to the problem of detection of local 
linear trends is discussed in [T3|. 

The problem considered in this paper naturally extends to higher dimensions. How- 
ever, it is not clear how to apply the ideas developed here even to 2D and this issue 
deserves special attention. Interestingly, elegant theoretical results on rectifiable curves 
by P. Jones are tightly related to detection of linear structures in point clouds. Vari- 
ous methods of multiscale geometric analysis based on Jones' theory ( and references 
therein) use predefined (dyadic) partition schemes. It would be very important to find 
algorithms for fast linearization in point clouds. 

It is worth mentioning that the self-affine analysis of Sect. 01 may be done equally 
effectively by a multitude of techniques, and MTA is by no means claimed to be the most 
efficient one. We include this section in order to demonstrate the diversity of possible 
applications based on the single MTA decomposition of a time series. 
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A Mandelbrot cascade measures 

A Mandelbrot cascade measure Mfc, rrii), i — 1, . . . , n on the interval [0, 1] is constructed 
as follows. At step there is a unit mass distributed uniformly over the whole interval. 
At the first step we divide the interval [0, 1] into n subintervals of lengths r*, Y%=i r 'i — 1 
and assign to them masses nti, J27=i m i = 1- Within each interval the mass distribution 
is uniform. Next, we divide each subinterval % into n subsubintervals and assign to them 
uniform masses • rrij, j — 1, . . . , n, and so on. Therefore, at the kth step the interval 
[0, 1] is divided into n k subintervals, each carrying the uniform mass ■ . . . ■ m ik , with 
Zfc taken from the set 1, . . . , n with possible repetitions. 

Such measures were introduced first to model turbulent dissipation, and were studied 
by Mandelbrot [TEj . 
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Figure 1: Scheme of the Multiscale Trend Decomposition, a) At zero step X(t) is 
approximated by its global linear trend L (t). b) Detrended series Xi(t) = X{t) — L (t) 
is approximated by the piecewise linear function Li(t), the whole analysis is then repeated 
at each of subintervals [tj, t} +1 ). c) Resulting hierarchy of trends. See Sect. EJfor details. 
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a) b) c) 




Figure 2: Scheme of detection of edge points, a) At zero step X(t) is approximated by 
its global linear trend L (t). b) Epochs t 2 ) of global maximum and minimum of the 
detrended series Xi(t) = X(t) — L (t) are located, c) Analysis is repeated at each of 
subintervals [0,£i], [ti,^], and [t 2) !]• 
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a) MTD for Brownian walk 
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Figure 3: Decomposition of a Fractional Brownian walk with Hausdorff measure Ha = 
0.7. a) Piecewise linear approximations at levels I = 0, 1, 2, 10. b) Corresponding hierar- 
chical tree. 
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0.4 0.7 0.45 0.53 



Figure 4: Decomposition of the sum of three sinusoids, X{t) = sin(57rt) + |sin(607T7;) + 
sin(2007rf). a) X(t) on the background of three levels from its decomposition, b) Piece- 
wise linear approximation corresponding to the top level of the decomposition shown in 
panel a), c) Fragment corresponding to the middle level of a), d) Fragment corresponding 
to the bottom level of a). 
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Figure 5: Decomposition of a) Devil's Staircase (5 upper levels of Mj are shown) and b) 
modulated sinusoid with time-dependent frequency (15 levels are shown). 
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Figure 6: Illustration of removing unnecessary levels from the decomposition (for the 
signal shown in Fig. 4). Prominent saturation points correspond to the three levels 
shown in Fig. 4 
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Figure 7: Relation between Hausdorff measure and error scaling for fractional Brownian 
walks (FBW). a) Trajectories of FBWs with Ha = 0.1,0.5,0.9 and corresponding error 
scalings. b) Relation b(Ha) for FBWs, < Ha < 1, values of b averaged over 100 
realizations of FBW for each value of Ha. c) The same as b) for integrated FBWs. 



Multiscale Trend Analysis 



25 




Figure 8: Error-length dependence for individual vertices of trees Mx corresponding to 
FBW with Ha = 0.1,0.9. The scaling (jHJ) is clearly observed. 
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a) b) 




Time Time 

Figure 9: Estimation of local Hausdorff measures, Ha(t) for a multifractal (Mandelbrot 
cascade measure) (panel a) and monofractal (Brownian walk) (panel b). Corresponding 
time series are shown in panels c) (multifractal) and d) (monofractal). 
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Figure 10: Error-length dependence for multi- and monofractals of Fig. 9. Note that the 
point scattering is significantly larger for the multifractal. 
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Figure 11: Horton-Strahler indexing. 
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Figure 12: Three levels of detail in MTA description of a time series, a) Topological, 
b) r-metric, based on the interval partition, c) e-metric, based on local linear fit of the 
series. See details in Sect. |U 
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Figure 13: Dependence of scaling exponents Be,r,n (Eq. (fTU|) ) on the Hausdorff measure 
Ha of FBWs. Dashed line is B = 0.7 + #a. 
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Figure 14: Signed partition corresponding to a piecewise linear approximation (panel 
a), union of signed partitions (panel b), and triplet (a, b, c) for an interval of a union of 
partitions, see Sect. 15.21 
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Figure 15: Two signals coupled by an unobserved phenomenon. The signals tend to 
change their intermediate trends synchronously, while their overall shapes are different. 
The striking similarity is observed at intervals [0,0.1] and [0.2,0.55]. Note also the 
common peaks at t — 0.675, 0.775, 0.975. See details in Sect. 15.3.11 
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Figure 16: Correlation (reciprocal distance) \x (fTTj) between two signals shown in Fig. 15 
(panel a) and two independent Brownian walks (panel b). 
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Figure 17: Piecewise linear approximations L\i = 1,2 of the signals Fi(t) from Fig. 15 
at the levels of maximal correlation (Zi = 15, l 2 = 14). These approximations depict the 
intermediate-scale variations responsible for the signals' coupling. 
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Figure 18: Structure of the signals Fi(t),i = 1,2 shown in Fig. 15. a),e) Original signals 
Fi(t). b),f) Coupling parts A^t) ■ X(t). c),g) Non-linear deterministic trends. d),h) 
Random drifts. 
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Figure 19: Reconstruction (solid lines) of the coupling parts Ai(t) ■ X(t) (dashed lines). 
See Sect. 15.3. II for discussion. 
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